---
title: "Using a delay-adjusted case fatality ratio to estimate under-reporting"
description: "Using a corrected case fatality ratio, we calculate estimates of the level of under-reporting for any country with greater than ten deaths"
status: in-progress
rmarkdown_html_fragment: true
update: 2020-04-07
authors:
  - id: tim_russell
    corresponding: true
  - id: joel_hellewell
    equal: 1
  - id: sam_abbott
    equal: 1
  - id: nick_golding
  - id: hamish_gibbs
  - id: chris_jarvis
  - id: kevin_vanzandvoort
  - id: ncov-group
  - id: stefan_flasche
  - id: roz_eggo
  - id: john_edmunds
  - id: adam_kucharski
---

Aim

To estimate the percentage of symptomatic COVID-19 cases reported in different countries using case fatality ratio estimates based on data from the ECDC, correcting for the delay between confirmation and death.

Methods Summary

Current estimates of the percentage of symptomatic cases reported, for countries with more than ten deaths

Temporal variation

_Figure 1: Temporal variation in reporting rate. We calculate the percentage of symptomatic cases reported on each day a country has had more than ten deaths. We then fit a Generalised Additive Model to these data (see the Temporal variation model fitting section for details), highlighting the temporal trend in each country's reporting rate. The red shaded region is the 95% CI of the fitted GAM._

Current estimates

Figure 2: Estimates of the proportion of symptomatic cases reported in different countries, derived from cCFR estimates. Blue shading shows the 95% confidence interval (2.5% to 97.5%).

Table of current estimates

| Country | Percentage of symptomatic cases reported (95% CI) | Total cases | Total deaths |
|---|---|---|---|
| Albania | 11% (6.9% - 20%) | 377 | 22 |
| Algeria | 3.9% (3.1% - 5%) | 1423 | 173 |
| Andorra | 17% (10% - 31%) | 526 | 21 |
| Argentina | 19% (13% - 27%) | 1628 | 53 |
| Australia | 100% (72% - 100%) | 5844 | 42 |
| Austria | 45% (35% - 57%) | 12297 | 220 |
| Bangladesh | 4.7% (2.7% - 9.5%) | 123 | 12 |
| Belarus | 16% (8.5% - 32%) | 700 | 13 |
| Belgium | 7.7% (6.5% - 8.9%) | 20814 | 1632 |
| Bolivia | 7.8% (4.4% - 15%) | 194 | 14 |
| Bosnia and Herzegovina | 15% (9.2% - 24%) | 695 | 28 |
| Brazil | 11% (9.1% - 13%) | 12056 | 553 |
| Bulgaria | 18% (11% - 32%) | 549 | 22 |
| Burkina Faso | 14% (7.8% - 25%) | 364 | 17 |
| Canada | 26% (21% - 33%) | 16653 | 323 |
| Chile | 73% (47% - 100%) | 4815 | 37 |
| China | 33% (29% - 38%) | 82698 | 3335 |
| Colombia | 20% (14% - 30%) | 1579 | 46 |
| Croatia | 52% (29% - 100%) | 1222 | 16 |
| Cyprus | 20% (11% - 40%) | 465 | 14 |
| Czechia | 42% (30% - 59%) | 4822 | 78 |
| Democratic Republic of the Congo | 5.6% (3.4% - 9.8%) | 161 | 18 |
| Denmark | 17% (13% - 21%) | 4681 | 187 |
| Dominican Republic | 12% (8.8% - 17%) | 1828 | 86 |
| Ecuador | 13% (9.9% - 16%) | 3747 | 191 |
| Egypt | 9% (6.6% - 12%) | 1073 | 78 |
| Estonia | 42% (24% - 76%) | 1108 | 19 |
| Finland | 53% (33% - 89%) | 2176 | 27 |
| France | 5.6% (4.9% - 6.4%) | 74390 | 8911 |
| Germany | 42% (36% - 49%) | 99225 | 1607 |
| Greece | 17% (12% - 24%) | 1755 | 79 |
| Honduras | 7.8% (4.8% - 13%) | 305 | 22 |
| Hungary | 11% (7.4% - 16%) | 817 | 47 |
| India | 16% (12% - 21%) | 4421 | 114 |
| Indonesia | 7.3% (5.8% - 9.2%) | 2491 | 209 |
| Iran | 12% (11% - 14%) | 60500 | 3739 |
| Iraq | 10% (7.4% - 15%) | 1031 | 64 |
| Ireland | 18% (14% - 24%) | 5364 | 174 |
| Israel | 90% (62% - 100%) | 8904 | 57 |
| Italy | 6.7% (5.9% - 7.6%) | 132547 | 16525 |
| Japan | 30% (22% - 42%) | 3817 | 80 |
| Lebanon | 25% (15% - 46%) | 541 | 19 |
| Lithuania | 38% (21% - 77%) | 843 | 14 |
| Luxembourg | 52% (35% - 81%) | 2843 | 41 |
| Malaysia | 46% (32% - 67%) | 3793 | 62 |
| Mexico | 9.7% (7.3% - 13%) | 2439 | 125 |
| Moldova | 21% (12% - 38%) | 965 | 19 |
| Morocco | 7.1% (5.2% - 9.8%) | 1120 | 80 |
| Netherlands | 6.7% (5.7% - 7.7%) | 18803 | 1867 |
| North Macedonia | 15% (9.2% - 27%) | 570 | 21 |
| Norway | 79% (55% - 100%) | 5755 | 59 |
| Pakistan | 36% (25% - 53%) | 3864 | 54 |
| Panama | 22% (15% - 32%) | 2100 | 55 |
| Peru | 13% (9.2% - 17%) | 2561 | 92 |
| Philippines | 12% (9.3% - 16%) | 3660 | 163 |
| Poland | 22% (16% - 30%) | 4413 | 107 |
| Portugal | 23% (18% - 28%) | 11730 | 311 |
| Puerto Rico | 10% (6.2% - 18%) | 513 | 21 |
| Romania | 14% (11% - 18%) | 4057 | 157 |
| Russia | 53% (36% - 80%) | 6343 | 47 |
| San Marino | 8.2% (5.4% - 13%) | 277 | 32 |
| Saudi Arabia | 41% (27% - 64%) | 2523 | 38 |
| Serbia | 16% (11% - 24%) | 2200 | 58 |
| Slovenia | 28% (18% - 46%) | 1021 | 30 |
| South Africa | 100% (55% - 100%) | 1686 | 12 |
| South Korea | 64% (50% - 83%) | 10331 | 192 |
| Spain | 7.1% (6.2% - 8%) | 135032 | 13055 |
| Sweden | 10% (8.2% - 12%) | 7206 | 477 |
| Switzerland | 28% (23% - 34%) | 21574 | 584 |
| Thailand | 64% (39% - 100%) | 2258 | 26 |
| Tunisia | 16% (9.4% - 27%) | 574 | 22 |
| Turkey | 20% (17% - 24%) | 30217 | 649 |
| Ukraine | 16% (10% - 24%) | 1319 | 38 |
| United Arab Emirates | 70% (35% - 100%) | 2076 | 11 |
| United Kingdom | 4.9% (4.2% - 5.5%) | 51608 | 5373 |
| United States of America | 17% (15% - 19%) | 368196 | 10989 |

Table 1: Estimates of the proportion of symptomatic cases reported in different countries, derived from cCFR estimates based on case and death time-series data from the ECDC. Total cases and deaths in each country are also shown. Confidence intervals were calculated using an exact binomial test at the 95% confidence level.
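The exact binomial interval mentioned in the caption can be sketched as follows. This is a minimal illustration of a Clopper-Pearson interval found by bisection using only the Python standard library, not the code used to produce Table 1. The function names and the example counts (50 deaths among 500 cases with known outcomes) are hypothetical, and treating deaths as the number of successes and known-outcome cases as the number of trials is an assumption.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)

def _bisect_increasing(func, target, iterations=60):
    """Find p in [0, 1] with func(p) = target, for func increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) confidence interval for a binomial proportion."""
    # Lower bound: P(X >= k | p) = alpha / 2
    lower = 0.0 if k == 0 else _bisect_increasing(
        lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2.0)
    # Upper bound: P(X <= k | p) = alpha / 2
    upper = 1.0 if k == n else _bisect_increasing(
        lambda p: 1.0 - binom_cdf(k, n, p), 1.0 - alpha / 2.0)
    return lower, upper

# Hypothetical counts, not taken from Table 1
print(clopper_pearson(50, 500))
```

Because the number of cases with known outcomes is generally not an integer, it would need rounding before being used as `n` here; that rounding step is also an assumption of this sketch.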

Adjusting for outcome delay in CFR estimates

During an outbreak, the naive CFR (nCFR), i.e. the ratio of reported deaths to date to reported cases to date, will underestimate the true CFR because the outcome (recovery or death) is not known for all cases [5]. We can therefore estimate the true denominator for the CFR (i.e. the number of cases with known outcomes) by accounting for the delay from confirmation to death [1].

We assumed the delay from confirmation to death followed the same distribution as the estimated delay from hospitalisation to death, based on data from the COVID-19 outbreak in Wuhan, China, between 17th December 2019 and 22nd January 2020, accounting for right-censoring in the data as a result of as-yet-unknown disease outcomes (Figure 1, panels A and B in [7]). The distribution used is a lognormal fit with a mean delay of 13 days and a standard deviation of 12.7 days [7].
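As an illustration of how such a delay distribution might be turned into daily probabilities, the sketch below converts the stated mean and standard deviation into lognormal parameters by moment matching and discretises the result into one-day bins. Both the moment-matching step and the binning rule are assumptions made for this example, and all function names are hypothetical; the analysis code may parameterise the distribution differently.

```python
import math

MEAN_DELAY = 13.0  # days, as stated in the text
SD_DELAY = 12.7    # days, as stated in the text

def lognormal_params(mean, sd):
    """Moment-match a lognormal: return (mu, sigma) on the log scale."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)

def lognormal_cdf(x, mu, sigma):
    """P(X <= x) for a lognormal with log-scale parameters mu and sigma."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

def delay_pmf(max_days, mean=MEAN_DELAY, sd=SD_DELAY):
    """f_j = P(j <= delay < j + 1) for j = 0, ..., max_days - 1 (assumed binning)."""
    mu, sigma = lognormal_params(mean, sd)
    return [lognormal_cdf(j + 1, mu, sigma) - lognormal_cdf(j, mu, sigma)
            for j in range(max_days)]

f = delay_pmf(60)
# Share of confirmation-to-death delays shorter than 60 days under these assumptions
print(round(sum(f), 3))
```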

To correct the CFR, we use the case and death incidence data to estimate the proportion of cases with known outcomes [1,6]:

\[ u_{t} = \frac{ \sum_{j = 0}^{t} c_{t-j} f_j}{c_t}, \]

where \(u_t\) is the estimated proportion of cases with known outcomes [1,5,6], which is used to scale the cumulative number of cases in the denominator of the cCFR, \(c_{t}\) is the daily case incidence at time \(t\), and \(f_j\) is the proportion of cases with a delay of \(j\) days between confirmation and death.
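The calculation above can be sketched in a few lines of Python. This is a minimal illustration with toy inputs, not the analysis code: the function names are hypothetical, and the final step (dividing total deaths by the cumulative number of cases with known outcomes to obtain the cCFR) is one reading of how the scaling is applied.

```python
def known_outcomes(cases, delay_pmf):
    """Numerator of u_t for each day t: sum over j of c_{t-j} * f_j."""
    known = []
    for t in range(len(cases)):
        total = 0.0
        # Only delays covered by the supplied distribution contribute
        for j in range(min(t + 1, len(delay_pmf))):
            total += cases[t - j] * delay_pmf[j]
        known.append(total)
    return known

def known_outcome_proportion(cases, delay_pmf):
    """u_t = known_t / c_t, or None on days with no reported cases."""
    known = known_outcomes(cases, delay_pmf)
    return [k / c if c > 0 else None for k, c in zip(known, cases)]

def naive_cfr(deaths, cases):
    """nCFR: deaths to date divided by cases to date."""
    return sum(deaths) / sum(cases)

def corrected_cfr(deaths, cases, delay_pmf):
    """cCFR under the assumed reading: deaths over cumulative known outcomes."""
    return sum(deaths) / sum(known_outcomes(cases, delay_pmf))

# Toy inputs for illustration only, not real data
cases = [10, 20, 30]
deaths = [0, 1, 2]
toy_pmf = [0.5, 0.3, 0.2]

print(known_outcome_proportion(cases, toy_pmf))
print(naive_cfr(deaths, cases), corrected_cfr(deaths, cases, toy_pmf))
```

Because fewer cases have known outcomes than have been reported, the corrected estimate in this sketch is never lower than the naive one.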

Approximating the proportion of symptomatic cases reported

At this stage, raw estimates of the CFR of COVID-19 correcting for delay to outcome, but not under-reporting, have been calculated. These estimates range between 1% and 1.5% [1–3]. We assume a CFR of 1.4% (95% CrI: 1.2-1.7%), taken from a recent large study [3], as a baseline CFR, and use it to approximate the potential level of under-reporting in each country. Specifically, we calculate \(\frac{1.4\%}{\text{cCFR}}\) for each country to approximate the fraction of symptomatic cases reported.
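A worked example of this ratio, as a minimal sketch: the cCFR values below are hypothetical, the function name is illustrative, and capping the result at 100% is an assumption inferred from the 100% entries in Table 1 rather than a documented step.

```python
BASELINE_CFR = 0.014  # baseline CFR of 1.4% assumed in the text

def reporting_fraction(ccfr, baseline=BASELINE_CFR):
    """Approximate fraction of symptomatic cases reported: baseline / cCFR."""
    if ccfr <= 0:
        raise ValueError("cCFR must be positive")
    # Assumed cap: a cCFR below the baseline is read as full reporting
    return min(1.0, baseline / ccfr)

# Hypothetical cCFR of 10%: about 14% of symptomatic cases reported
print(reporting_fraction(0.10))
# Hypothetical cCFR of 1%: below the baseline, so the estimate is capped at 100%
print(reporting_fraction(0.01))
```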

Temporal variation model fitting

We estimate the level of under-reporting on every day for each country that has had more than ten deaths. We then fit a Generalised Additive Model (GAM) of the form

\[ \log(\mathbb{E}[D]) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p, \]

specifying a Poisson distribution on deaths (\(D\)) as the response variable. The model has a log-link function and a log-offset (\(\log(\kappa)\)) consisting of the daily known-outcomes term \(u_t c_t\) and the cCFR estimate for that country on that day, \(\text{cfr}_t\). The model can then be written as

\[ D \sim s(t) + \underbrace{\log(u_t c_t) + \log(\text{cfr}_t)}_{:= \log(\kappa)}, \]

where \(s(t)\) is a smoothing spline, fitted through the time points (days) for which we have data.
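Read as a standard Poisson regression with a log link and an offset, the model above can be expanded as follows. This is a sketch of one consistent reading, assuming the offset enters on the log scale with its coefficient fixed at 1; \(\lambda_t\) is notation introduced here for the expected number of deaths on day \(t\).

\[ D_t \sim \text{Poisson}(\lambda_t), \qquad \log(\lambda_t) = s(t) + \log(\kappa_t), \qquad \kappa_t = u_t c_t \, \text{cfr}_t, \]

so that \(\lambda_t = \kappa_t \exp(s(t))\). Under this reading the spline acts as a smooth, time-varying multiplier on the offset \(\kappa_t\).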

Limitations

Implicit in assuming that the under-reporting is \(\frac{1.4\%}{\text{cCFR}}\) for a given country is that any deviation from the assumed 1.4% CFR is entirely due to under-reporting. In reality, the burden on the healthcare system is a likely contributing factor to CFR estimates higher than 1.4%, along with many other country-specific factors.

The following is a list of the other prominent assumptions made in our analysis:

Code and data availability

The code is publicly available at https://github.com/thimotei/CFR_calculation. The data required for this analysis are time series of cases and deaths, along with the corresponding delay distribution. We scrape these data from the ECDC using the NCoVUtils package [8].

References

1 Russell TW, Hellewell J, Jarvis CI et al. Estimating the infection and case fatality ratio for COVID-19 using age-adjusted data from the outbreak on the Diamond Princess cruise ship. medRxiv 2020.

2 Verity R, Okell LC, Dorigatti I et al. Estimates of the severity of COVID-19 disease. medRxiv 2020.

3 Guan W-j, Ni Z-y, Hu Y et al. Clinical characteristics of coronavirus disease 2019 in China. New England Journal of Medicine 2020.

4 Shim E, Mizumoto K, Choi W et al. Estimating the risk of COVID-19 death during the course of the outbreak in Korea, February-March, 2020. medRxiv 2020.

5 Kucharski AJ, Edmunds WJ. Case fatality rate for Ebola virus disease in West Africa. The Lancet 2014;384:1260.

6 Nishiura H, Klinkenberg D, Roberts M et al. Early epidemiological assessment of the virulence of emerging infectious diseases: A case study of an influenza pandemic. PLoS One 2009;4.

7 Linton NM, Kobayashi T, Yang Y et al. Incubation period and other epidemiological characteristics of 2019 novel coronavirus infections with right truncation: A statistical analysis of publicly available case data. Journal of Clinical Medicine 2020;9:538.

8 Abbott S MJ Hellewell J. NCoVUtils: Utility functions for the 2019-nCoV outbreak. doi:10.5281/zenodo.3635417 2020.